A Linear Size Index for Approximate Pattern Matching

Authors

Ho-Leung Chan

Tak Wah Lam

Wing-Kin Sung

Siu-Lung Tam

Swee-Seong Wong

Abstract

This paper revisits the problem of indexing a text S[1..n] to support searching for substrings of S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time complexity of Ω(m^k) or requires Ω(n^k) space. Devising a solution with better performance remained a challenge until Cole et al. [5] gave an O(n log^k n)-space index that supports k-error matching in O(m + occ + log^k n log log n) time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an O(n)-space index that supports k-error matching in O(m + occ + (log n)^{k(k+1)} log log n) worst-case time. Furthermore, the index can be compressed from O(n) words into O(n) bits with a slight increase in the time complexity.
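To make the query semantics concrete, the sketch below scans the text directly, with no index, and reports where a k-error match ends. It is not the index described in the abstract: it is the classical column-by-column dynamic program for approximate matching, running in O(nm) time, shown only as a baseline. The function name `k_error_end_positions` is hypothetical, and "errors" is assumed here to mean edit distance (insertions, deletions and substitutions).

```python
def k_error_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based end positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Baseline scan only (no index); assumes errors are edit operations.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # previous column's value for row i - 1
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            cur = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                col[i] + 1,                         # extra text character
                col[i - 1] + 1,                     # unmatched pattern character
            )
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `k_error_end_positions("acgt", "agt", 1)` returns `[4]`, because the substring "cgt" ending at position 4 differs from "agt" by one substitution. An index such as the one in the abstract aims to answer the same query in time that depends on m and occ rather than on a full scan of the text.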

Similar resources

Parameterized matching on non-linear structures

The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. In clas...

Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs

We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.

An Index for Two Dimensional String Matching Allowing Rotations

We present an index to search a two-dimensional pattern of size m × m in a two-dimensional text of size n × n, even when the pattern appears rotated in the text. The index is based on (path compressed) tries. By using O(n^2) (i.e. linear) space the index can search the pattern in O((log_σ n)^{5/2}) time on average, where σ is the alphabet size. We also consider various schemes for approximate matching,...

FAMOUS: Fast Approximate string Matching using OptimUm search Schemes

Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. The introduction of practical bidirectional indices has opened new possibilities for solving the problem as they allow the search to be started from anywhere within the pattern and extended in both directions. In particular, use of search sch...

A Hybrid Indexing Method for Approximate String Matching

We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the average retrieval time is O(n^λ log n), for some λ > 0 that depends on the error fraction tolerated α and the alphabet size σ. It is shown that λ < 1 for approximately α < 1 − e/√σ, where e = 2.718.... The space required is four times...

Publication date: 2006
